Abstract — This work refers to moderate-deviations analysis of 
binary hypothesis testing. It relies on a concentration inequality 
for discrete-parameter martingales with bounded jumps, which 
forms a refinement to the Azuma-Hoeffding inequality. Relations 
of the analysis to the moderate deviations principle for i.i.d. 
random variables and the relative entropy are considered. 

Index Terms — Concentration inequalities, hypothesis testing, 
moderate deviations principle. 



I. Introduction 

The moderate deviations analysis in the context of source
and channel coding has recently attracted some interest among
information theorists (see [1], [4], [11], [16], [19] and [22]).
The purpose of this paper is to consider moderate deviations 
analysis for binary hypothesis testing. 

In the following, related literature on moderate deviations
analysis in information-theoretic aspects is briefly reviewed.
Moderate deviations were analyzed in [1, Section 4.3] for
a channel model that gets noisier as the block length is
increased. Due to the dependence of the channel parameter
on the block length, the usual notion of capacity for these
channels is zero. Hence, the issue of increasing the block 
length for the considered type of degrading channels was 
examined in [1, Section 4.3] via moderate deviations analysis 
when the number of codewords increases sub-exponentially 
with the block length. In another recent work [4], the moderate 
deviations behavior of channel coding for discrete memoryless 
channels was studied by Altug and Wagner with a derivation 
of direct and converse results which explicitly characterize the 
rate function of the moderate deviations principle (MDP). In
[4], the authors studied the interplay between the probability
of error, code rate and block length when the communication 
takes place over discrete memoryless channels, having the 
interest to figure out how the decoding error probability of 
the best code scales when simultaneously the block length 
tends to infinity and the code rate approaches the channel 
capacity. The novelty in the setup of their analysis was the 
consideration of the scenario mentioned above, in contrast 
to the case where the rate is kept fixed below capacity, and 
the study is reduced to a characterization of the dependence 
between the two remaining parameters (i.e., the block length n 
and the average/maximal error probability of the best code).
As opposed to the latter case when the code rate is kept
fixed, which then corresponds to large deviations analysis
and characterizes the error exponents as a function of the
rate, the analysis in [4] (via the introduction of direct and
converse theorems) demonstrated a sub-exponential scaling
of the maximal error probability in the considered moderate
deviations regime. This work was followed by a work by
Polyanskiy and Verdú [17], where they show that a DMC satisfies
the MDP if and only if its channel dispersion is non-zero, and
also that the AWGN channel satisfies the MDP with a constant
that is equal to the channel dispersion. The approach used in
[4] was based on the method of types, whereas the approach
used in [17] borrowed some tools from a recent work by the
same authors in [16].

In [11], the moderate deviations analysis of the Slepian-Wolf
problem for lossless source coding was studied. More
recently, moderate deviations analysis for lossy source coding
of stationary memoryless sources was studied in [22].

These works, including this paper, indicate a recent interest 
in moderate deviations analysis in the context of information- 
theoretic problems. In the literature on probability theory, the 
moderate deviations analysis was extensively studied (see, e.g., 
[10, Section 3.7]), and in particular the MDP was studied in
[9] for continuous-time martingales with bounded jumps.

This paper has the following structure: Section II briefly
introduces some preliminary material related to martingales and
Azuma's inequality. It then introduces a refined version of
Azuma's inequality, and studies its relation to the moderate
deviations principle for i.i.d. random variables. Section III
considers the relation of Azuma's inequality and the refined
version of this inequality (from Section II) to moderate
deviations analysis of binary hypothesis testing. Section IV
concludes the paper, followed by a discussion on the MDP
that is relegated to an appendix.

II. Concentration and Its Relation to the 
Moderate Deviations Principle 

We present here some essential material that is related to
the martingale approach used in this paper for the
moderate-deviations analysis of binary hypothesis testing. A
background on martingales is provided in, e.g., [23]; we rely
here only on basic knowledge of martingales.

A. Azuma's Inequality

Azuma's inequality¹ forms a useful concentration inequality
for bounded-difference martingales [5]. In the following, this
inequality is introduced. The reader is referred to, e.g., [6] and
[15] for surveys on concentration inequalities for martingales
(including a proof of this inequality).

Theorem 1: [Azuma's inequality] Let $\{X_k, \mathcal{F}_k\}_{k=0}^{\infty}$ be
a discrete-parameter real-valued martingale sequence (where
$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots$ is called a filtration). Assume that for every
$k \in \mathbb{N}$, the condition $|X_k - X_{k-1}| \le d_k$ holds a.s. for some
non-negative constants $\{d_k\}_{k=1}^{\infty}$. Then

$$ \mathbb{P}(|X_n - X_0| \ge r) \le 2 \exp\left( -\frac{r^2}{2 \sum_{k=1}^{n} d_k^2} \right), \quad \forall\, r \ge 0. \tag{1} $$

The concentration inequality stated in Theorem 1 was
proved in [12] for independent bounded random variables, and
it was later derived in [5] for bounded-difference martingales.

¹Azuma's inequality is also known as the Azuma-Hoeffding inequality. It
will be named from this point as Azuma's inequality for the sake of brevity.

B. A Refined Version of Azuma's Inequality 

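Theorem 2: Let $\{X_k, \mathcal{F}_k\}_{k=0}^{\infty}$ be a discrete-parameter
real-valued martingale. Assume that, for some constants $d, \sigma > 0$,
the following two requirements are satisfied a.s.

$$ |X_k - X_{k-1}| \le d, $$

$$ \mathrm{Var}(X_k \mid \mathcal{F}_{k-1}) = \mathbb{E}\left[ (X_k - X_{k-1})^2 \mid \mathcal{F}_{k-1} \right] \le \sigma^2 $$

for every $k \in \{1, \ldots, n\}$. Then, for every $\alpha \ge 0$,

$$ \mathbb{P}(|X_n - X_0| \ge \alpha n) \le 2 \exp\left( -n \, D\left( \frac{\delta + \gamma}{1 + \gamma} \,\Big\|\, \frac{\gamma}{1 + \gamma} \right) \right) \tag{2} $$

where

$$ \gamma \triangleq \frac{\sigma^2}{d^2}, \qquad \delta \triangleq \frac{\alpha}{d} \tag{3} $$

and $D(p\|q) \triangleq p \ln\left(\frac{p}{q}\right) + (1-p) \ln\left(\frac{1-p}{1-q}\right)$ for $p, q \in [0,1]$
is the divergence (a.k.a. relative entropy or Kullback-Leibler
distance) between the two probability distributions $(p, 1-p)$
and $(q, 1-q)$. If $\delta > 1$, then the probability on the left-hand
side of (2) is equal to zero.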

Proof: See [14], [10, Corollary 2.4.7] or [19, Section III].
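To illustrate how the bounds in (1) and (2) compare, the following is a minimal sketch that evaluates both for a martingale with uniformly bounded jumps ($d_k = d$ and $r = \alpha n$ in Theorem 1). The function names and the numerical values used with them are hypothetical and serve only as an illustration.

```python
import math

def binary_kl(p, q):
    """D(p||q) in nats between (p, 1-p) and (q, 1-q), with 0*ln(0) = 0."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total

def azuma_bound(n, alpha, d):
    """Right-hand side of (1) with d_k = d for all k and r = alpha * n."""
    return min(1.0, 2.0 * math.exp(-n * alpha ** 2 / (2.0 * d ** 2)))

def refined_bound(n, alpha, d, sigma2):
    """Right-hand side of (2); the probability is zero when delta > 1."""
    gamma = sigma2 / d ** 2      # gamma in (3)
    delta = alpha / d            # delta in (3)
    if delta > 1.0:
        return 0.0
    p = (delta + gamma) / (1.0 + gamma)
    q = gamma / (1.0 + gamma)
    return min(1.0, 2.0 * math.exp(-n * binary_kl(p, q)))

if __name__ == "__main__":
    # Illustrative values only: jumps bounded by d = 1, conditional variance <= 0.25.
    print(azuma_bound(100, 0.2, 1.0), refined_bound(100, 0.2, 1.0, 0.25))
```

When the conditional variance is much smaller than $d^2$, the refined bound is considerably smaller than the bound of Theorem 1 for the same $n$ and $\alpha$.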



C. Relation of Theorem 2 with the Moderate Deviations
Principle for i.i.d. RVs

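According to the moderate deviations theorem (see, e.g.,
[10, Theorem 3.7.1]) in $\mathbb{R}$, let $\{X_i\}_{i=1}^{n}$ be a sequence of i.i.d.
real-valued RVs such that $\Lambda_X(\lambda) = \mathbb{E}[e^{\lambda X_i}] < \infty$ in some
neighborhood of zero, and also assume that $\mathbb{E}[X_i] = 0$ and
$\sigma^2 = \mathrm{Var}(X_i) > 0$. Let $\{a_n\}_{n=1}^{\infty}$ be a non-negative sequence
such that $a_n \to 0$ and $n a_n \to \infty$ as $n \to \infty$, and let

$$ Z_n \triangleq \sqrt{\frac{a_n}{n}} \sum_{i=1}^{n} X_i, \quad \forall\, n \in \mathbb{N}. \tag{4} $$

Then, for every measurable set $\Gamma \subseteq \mathbb{R}$,

$$ -\frac{1}{2\sigma^2} \inf_{x \in \Gamma^{\circ}} x^2 \le \liminf_{n \to \infty} a_n \ln \mathbb{P}(Z_n \in \Gamma) \le \limsup_{n \to \infty} a_n \ln \mathbb{P}(Z_n \in \Gamma) \le -\frac{1}{2\sigma^2} \inf_{x \in \overline{\Gamma}} x^2 \tag{5} $$

where $\Gamma^{\circ}$ and $\overline{\Gamma}$ designate, respectively, the interior and closure
sets of $\Gamma$.

Let $\eta \in (\frac{1}{2}, 1)$ be an arbitrary fixed number, and let $\{a_n\}_{n=1}^{\infty}$
be the non-negative sequence

$$ a_n = n^{1-2\eta}, \quad \forall\, n \in \mathbb{N} $$

so that $a_n \to 0$ and $n a_n \to \infty$ as $n \to \infty$. Let $\alpha \in \mathbb{R}^{+}$, and
$\Gamma \triangleq (-\infty, -\alpha] \cup [\alpha, \infty)$. Note that, from (4),

$$ \mathbb{P}\left( \left| \sum_{i=1}^{n} X_i \right| \ge \alpha n^{\eta} \right) = \mathbb{P}(Z_n \in \Gamma) $$

so from the moderate deviations principle (MDP)

$$ \lim_{n \to \infty} n^{1-2\eta} \ln \mathbb{P}\left( \left| \sum_{i=1}^{n} X_i \right| \ge \alpha n^{\eta} \right) = -\frac{\alpha^2}{2\sigma^2}, \quad \forall\, \alpha \ge 0. \tag{6} $$

It is demonstrated in Appendix A that, in contrast to Azuma's
inequality, Theorem 2 gives an upper bound on the probability
$\mathbb{P}\left( \left| \sum_{i=1}^{n} X_i \right| \ge \alpha n^{\eta} \right)$ (where $n \in \mathbb{N}$ and $\alpha \ge 0$) which
coincides with the exact asymptotic limit in (6). The analysis
in Appendix A provides another interesting link between
Theorem 2 and a classical result in probability theory, which
also emphasizes the significance of the refinements of Azuma's
inequality.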
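As a numerical illustration of the scaling in (6), the following sketch computes $n^{1-2\eta} \ln \mathbb{P}(|\sum_i X_i| \ge \alpha n^{\eta})$ exactly for a centered Bernoulli sequence $X_i = B_i - p$ with $B_i \sim \mathrm{Bernoulli}(p)$, so that $\sigma^2 = p(1-p)$. The choice of distribution, the function names and the parameter values are assumptions made for illustration; the convergence to $-\alpha^2/(2\sigma^2)$ is slow, so finite-$n$ values should only be expected to be roughly close to the limit.

```python
import math

def log_binom_pmf(n, k, p):
    """ln P(K = k) for K ~ Binomial(n, p), computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def log_two_sided_tail(n, p, threshold):
    """ln P(|K - n p| >= threshold) for K ~ Binomial(n, p), via log-sum-exp."""
    logs = [log_binom_pmf(n, k, p) for k in range(n + 1)
            if abs(k - n * p) >= threshold]
    if not logs:
        return float("-inf")
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

def normalized_log_prob(n, p, alpha, eta):
    """n^{1-2 eta} * ln P(|sum_i X_i| >= alpha n^eta) with X_i = B_i - p."""
    return n ** (1.0 - 2.0 * eta) * log_two_sided_tail(n, p, alpha * n ** eta)

if __name__ == "__main__":
    p, alpha, eta = 0.3, 1.0, 0.75          # illustrative values
    limit = -alpha ** 2 / (2.0 * p * (1.0 - p))   # right-hand side of (6)
    for n in (500, 2000, 8000):
        print(n, normalized_log_prob(n, p, alpha, eta), limit)
```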

III. Moderate Deviations Analysis for Binary 
Hypothesis Testing 

Binary hypothesis testing for finite alphabet models was
analyzed via the method of types, e.g., in [7, Chapter 11] and
[8]. It is assumed that the data sequence is of a fixed length
(n), and one wishes to make the optimal decision based on
the received sequence and the Neyman-Pearson ratio test.

Let the RVs $X_1, X_2, \ldots$ be i.i.d. $\sim Q$, and consider two
hypotheses:

• $H_1 : Q = P_1$.

• $H_2 : Q = P_2$.

For the simplicity of the analysis, let us assume that the RVs
are discrete, and take their values on a finite alphabet $\mathcal{X}$ where
$P_1(x), P_2(x) > 0$ for every $x \in \mathcal{X}$.

In the following, let

$$ L(X_1, \ldots, X_n) \triangleq \ln \frac{P_1^n(X_1, \ldots, X_n)}{P_2^n(X_1, \ldots, X_n)} = \sum_{i=1}^{n} \ln \frac{P_1(X_i)}{P_2(X_i)} $$

designate the log-likelihood ratio. By the strong law of large
numbers (SLLN), if hypothesis $H_1$ is true, then a.s.

$$ \lim_{n \to \infty} \frac{L(X_1, \ldots, X_n)}{n} = D(P_1 \| P_2) \tag{7} $$

and otherwise, if hypothesis $H_2$ is true, then a.s.

$$ \lim_{n \to \infty} \frac{L(X_1, \ldots, X_n)}{n} = -D(P_2 \| P_1) \tag{8} $$

where the above assumptions on the probability mass functions
$P_1$ and $P_2$ imply that the relative entropies, $D(P_1 \| P_2)$ and
$D(P_2 \| P_1)$, are both finite. Consider the case where for some
fixed constants $\overline{\lambda}, \underline{\lambda} \in \mathbb{R}$ that satisfy

$$ -D(P_2 \| P_1) < \underline{\lambda} \le \overline{\lambda} < D(P_1 \| P_2) $$

one decides on hypothesis $H_1$ if $L(X_1, \ldots, X_n) > n \overline{\lambda}$, and
on hypothesis $H_2$ if $L(X_1, \ldots, X_n) < n \underline{\lambda}$. Note that if
$\overline{\lambda} = \underline{\lambda} \triangleq \lambda$ then a decision on the two hypotheses is based
on comparing the normalized log-likelihood ratio (w.r.t. n) to
a single threshold ($\lambda$), and deciding on hypothesis $H_1$ or $H_2$
if this normalized log-likelihood ratio is, respectively, above
or below $\lambda$. If $\underline{\lambda} < \overline{\lambda}$ then one decides on $H_1$ or $H_2$ if
the normalized log-likelihood ratio is, respectively, above the
upper threshold $\overline{\lambda}$ or below the lower threshold $\underline{\lambda}$. Otherwise,
if the normalized log-likelihood ratio is between the upper and
lower thresholds, then an erasure is declared and no decision
is taken in this case.
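The decision rule with an erasure option described above can be sketched as follows. This is only an illustration: the probability mass functions are represented as Python dictionaries over the alphabet, and the function and argument names are hypothetical.

```python
import math

def log_likelihood_ratio(xs, p1, p2):
    """L(x_1, ..., x_n) = sum_i ln(P1(x_i) / P2(x_i)) for pmfs given as dicts."""
    return sum(math.log(p1[x] / p2[x]) for x in xs)

def decide(xs, p1, p2, lam_upper, lam_lower):
    """Return 'H1', 'H2' or 'erasure' using the two-threshold rule.

    lam_upper and lam_lower play the roles of the upper and lower thresholds
    on the normalized log-likelihood ratio, with lam_lower <= lam_upper.
    """
    n = len(xs)
    llr = log_likelihood_ratio(xs, p1, p2)
    if llr > n * lam_upper:
        return "H1"
    if llr < n * lam_lower:
        return "H2"
    return "erasure"

if __name__ == "__main__":
    # Illustrative binary pmfs; both are strictly positive on the alphabet.
    p1 = {0: 0.8, 1: 0.2}
    p2 = {0: 0.3, 1: 0.7}
    print(decide([0, 0, 1, 0, 0, 0, 1, 0], p1, p2, 0.1, -0.1))
```

Setting `lam_upper == lam_lower` recovers the single-threshold test, up to the tie at the threshold itself, which this sketch reports as an erasure.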
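Let

$$ \alpha_n^{(1)} \triangleq P_1^n\left( L(X_1, \ldots, X_n) \le n \overline{\lambda} \right) \tag{9} $$

$$ \alpha_n^{(2)} \triangleq P_1^n\left( L(X_1, \ldots, X_n) \le n \underline{\lambda} \right) \tag{10} $$

and

$$ \beta_n^{(1)} \triangleq P_2^n\left( L(X_1, \ldots, X_n) \ge n \underline{\lambda} \right) \tag{11} $$

$$ \beta_n^{(2)} \triangleq P_2^n\left( L(X_1, \ldots, X_n) \ge n \overline{\lambda} \right) \tag{12} $$

then $\alpha_n^{(1)}$ and $\beta_n^{(1)}$ are the probabilities of either making an
error or declaring an erasure under, respectively, hypotheses
$H_1$ and $H_2$; similarly $\alpha_n^{(2)}$ and $\beta_n^{(2)}$ are the probabilities of
making an error under hypotheses $H_1$ and $H_2$, respectively.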

Let $\pi_1, \pi_2 \in (0, 1)$ denote the a-priori probabilities of the
hypotheses $H_1$ and $H_2$, respectively, so

$$ P_{e,n}^{(1)} = \pi_1 \alpha_n^{(1)} + \pi_2 \beta_n^{(1)} \tag{13} $$

is the probability of having either an error or an erasure, and

$$ P_{e,n}^{(2)} = \pi_1 \alpha_n^{(2)} + \pi_2 \beta_n^{(2)} \tag{14} $$

is the probability of error.

Based on the asymptotic results in (7) and (8), which
hold a.s. under hypotheses $H_1$ and $H_2$ respectively, the large
deviations analysis refers to upper and lower thresholds $\overline{\lambda}$ and
$\underline{\lambda}$ which are kept fixed (i.e., these thresholds do not depend on
the block length n of the data sequence) where

$$ -D(P_2 \| P_1) < \underline{\lambda} \le \overline{\lambda} < D(P_1 \| P_2). $$

Suppose that instead of having some fixed upper and lower
thresholds, one is interested to set these thresholds such that
as the block length n tends to infinity, they tend simultaneously
to their asymptotic limits in (7) and (8), i.e.,

$$ \lim_{n \to \infty} \overline{\lambda}^{(n)} = D(P_1 \| P_2), \qquad \lim_{n \to \infty} \underline{\lambda}^{(n)} = -D(P_2 \| P_1). $$

Specifically, let $\eta \in (\frac{1}{2}, 1)$ and $\varepsilon_1, \varepsilon_2 > 0$ be arbitrary
fixed numbers, and consider the case where one decides on
hypothesis $H_1$ if $L(X_1, \ldots, X_n) > n \overline{\lambda}^{(n)}$, and on hypothesis
$H_2$ if $L(X_1, \ldots, X_n) < n \underline{\lambda}^{(n)}$ where these upper and lower
thresholds are set to

$$ \overline{\lambda}^{(n)} \triangleq D(P_1 \| P_2) - \varepsilon_1 n^{-(1-\eta)} $$

$$ \underline{\lambda}^{(n)} \triangleq -D(P_2 \| P_1) + \varepsilon_2 n^{-(1-\eta)} $$

so that they approach, respectively, the limits
$D(P_1 \| P_2)$ and $-D(P_2 \| P_1)$ in the asymptotic case where
the block length n of the data sequence tends to infinity.
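The block-length dependent thresholds can be computed directly from their definitions. The sketch below uses hypothetical argument names (`d12` for $D(P_1 \| P_2)$ and `d21` for $D(P_2 \| P_1)$) and illustrative numerical values.

```python
def md_thresholds(n, d12, d21, eps1, eps2, eta):
    """Upper and lower thresholds for block length n, with eta in (1/2, 1)."""
    shrink = n ** (-(1.0 - eta))     # n^{-(1 - eta)}, tends to zero as n grows
    upper = d12 - eps1 * shrink      # approaches D(P1||P2) from below
    lower = -d21 + eps2 * shrink     # approaches -D(P2||P1) from above
    return upper, lower

if __name__ == "__main__":
    for n in (10 ** 2, 10 ** 4, 10 ** 6):
        print(n, md_thresholds(n, 0.5, 0.6, 1.0, 1.0, 0.75))
```

For small n the two thresholds may cross (the lower one exceeding the upper one); the analysis concerns block lengths that are large enough for the ordering of the thresholds to hold.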



Accordingly, the conditional probabilities in (9)-(12) are
modified so that the fixed thresholds $\overline{\lambda}$ and $\underline{\lambda}$ are replaced
with the above block-length dependent thresholds $\overline{\lambda}^{(n)}$ and
$\underline{\lambda}^{(n)}$, respectively. The moderate deviations analysis for binary
hypothesis testing studies the probability of an error event and
the probability of a joint error and erasure event under the two
hypotheses, and it studies the interplay between each of these
probabilities, the block length n, and the related thresholds
that tend asymptotically to the limits in (7) and (8) when the
block length tends to infinity.

In light of the discussion in Section II-C on the MDP for
i.i.d. RVs and the discussion of its relation to Theorem 2 (see
Appendix A), and also motivated by the three recent works in
[1, Section 4.3], [4] and [11], we proceed to consider in the
following moderate deviations analysis for binary hypothesis
testing. Our approach for this kind of analysis is different, and
it relies on concentration inequalities for martingales.

In the following, we analyze the probability of a joint error
and erasure event under hypothesis $H_1$, i.e., derive an upper
bound on $\alpha_n^{(1)}$ in (9). The same kind of analysis can be adapted
easily for the other probabilities in (10)-(12).

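Under hypothesis $H_1$, let us construct the martingale sequence
$\{U_k, \mathcal{F}_k\}_{k=0}^{n}$ where $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots \subseteq \mathcal{F}_n$ is the filtration

$$ \mathcal{F}_0 = \{\emptyset, \Omega\}, \qquad \mathcal{F}_k = \sigma(X_1, \ldots, X_k), \quad \forall\, k \in \{1, \ldots, n\} $$

and

$$ U_k = \mathbb{E}_{P_1^n}\left[ L(X_1, \ldots, X_n) \mid \mathcal{F}_k \right]. \tag{15} $$

For every $k \in \{0, \ldots, n\}$,

$$ U_k = \mathbb{E}_{P_1^n}\left[ \sum_{i=1}^{n} \ln \frac{P_1(X_i)}{P_2(X_i)} \,\Big|\, \mathcal{F}_k \right] $$

$$ \phantom{U_k} = \sum_{i=1}^{k} \ln \frac{P_1(X_i)}{P_2(X_i)} + \sum_{i=k+1}^{n} \mathbb{E}_{P_1^n}\left[ \ln \frac{P_1(X_i)}{P_2(X_i)} \right] $$

$$ \phantom{U_k} = \sum_{i=1}^{k} \ln \frac{P_1(X_i)}{P_2(X_i)} + (n-k) D(P_1 \| P_2). $$

In particular

$$ U_0 = n D(P_1 \| P_2), \tag{16} $$

$$ U_n = \sum_{i=1}^{n} \ln \frac{P_1(X_i)}{P_2(X_i)} = L(X_1, \ldots, X_n) \tag{17} $$

and, for every $k \in \{1, \ldots, n\}$,

$$ U_k - U_{k-1} = \ln \frac{P_1(X_k)}{P_2(X_k)} - D(P_1 \| P_2). \tag{18} $$

Let

$$ d_1 \triangleq \max_{x \in \mathcal{X}} \left| \ln \frac{P_1(x)}{P_2(x)} - D(P_1 \| P_2) \right| \tag{19} $$

so $d_1 < \infty$ since by assumption the alphabet set $\mathcal{X}$ is finite,
and $P_1(x), P_2(x) > 0$ for every $x \in \mathcal{X}$. From (18) and (19),
$|U_k - U_{k-1}| \le d_1$ a.s. for every $k \in \{1, \ldots, n\}$, and due to
the statistical independence of $\{X_i\}$

$$ \mathbb{E}_{P_1^n}\left[ (U_k - U_{k-1})^2 \mid \mathcal{F}_{k-1} \right] = \mathbb{E}_{P_1}\left[ \left( \ln \frac{P_1(X)}{P_2(X)} - D(P_1 \| P_2) \right)^2 \right] = \sum_{x \in \mathcal{X}} P_1(x) \left( \ln \frac{P_1(x)}{P_2(x)} - D(P_1 \| P_2) \right)^2 \triangleq \sigma_1^2. \tag{20} $$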
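The quantities $D(P_1 \| P_2)$, $d_1$ in (19) and $\sigma_1^2$ in (20) depend only on the two probability mass functions, and can be computed as in the following sketch. The dictionary representation of the pmfs and the function names are assumptions made for illustration.

```python
import math

def kl_divergence(p1, p2):
    """D(P1||P2) in nats for strictly positive pmfs given as dicts."""
    return sum(p1[x] * math.log(p1[x] / p2[x]) for x in p1)

def jump_bound_and_variance(p1, p2):
    """Return (D(P1||P2), d_1 from (19), sigma_1^2 from (20))."""
    d12 = kl_divergence(p1, p2)
    # Centered log-likelihood ratio per letter; this is the jump U_k - U_{k-1} in (18).
    centered = {x: math.log(p1[x] / p2[x]) - d12 for x in p1}
    d1 = max(abs(v) for v in centered.values())
    sigma1_sq = sum(p1[x] * centered[x] ** 2 for x in p1)
    return d12, d1, sigma1_sq

if __name__ == "__main__":
    p1 = {0: 0.8, 1: 0.2}    # illustrative pmfs
    p2 = {0: 0.3, 1: 0.7}
    print(jump_bound_and_variance(p1, p2))
```

Since the variance of a bounded zero-mean RV cannot exceed the square of its largest absolute value, the returned values always satisfy $\sigma_1^2 \le d_1^2$.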



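Let $\varepsilon_1 > 0$ and $\eta \in (\frac{1}{2}, 1)$ be two arbitrarily fixed numbers.
Then, under hypothesis $H_1$, it follows from Theorem 2 and
the above construction of a martingale that

$$ P_1^n\left( U_n - U_0 \le -\varepsilon_1 n^{\eta} \right) \le \exp\left( -n \, D\left( \frac{\delta_1^{(\eta,n)} + \gamma_1}{1 + \gamma_1} \,\Big\|\, \frac{\gamma_1}{1 + \gamma_1} \right) \right) \tag{21} $$

where

$$ \delta_1^{(\eta,n)} \triangleq \frac{\varepsilon_1 n^{-(1-\eta)}}{d_1}, \qquad \gamma_1 \triangleq \frac{\sigma_1^2}{d_1^2} \tag{22} $$

with $d_1$ and $\sigma_1^2$ from (19) and (20).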

In the following, we will make use of the following lemma:

Lemma 1:

$$ (1+u) \ln(1+u) \ge \begin{cases} u + \dfrac{u^2}{2}, & u \in [-1, 0] \\[2mm] u + \dfrac{u^2}{2} - \dfrac{u^3}{6}, & u \ge 0 \end{cases} \tag{23} $$

where at $u = -1$, the left-hand side is defined to be zero (it
is the limit of this function when $u \to -1$ from above).

Proof: The proof follows by elementary calculus. ■
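The inequality in (23) can be checked numerically on a grid; this is a sanity check rather than a proof. The grid range and the tolerance in the sketch below are arbitrary choices.

```python
import math

def lhs(u):
    """(1 + u) ln(1 + u), defined as zero at u = -1."""
    return 0.0 if u == -1.0 else (1.0 + u) * math.log(1.0 + u)

def rhs(u):
    """Right-hand side of (23): quadratic on [-1, 0], cubic for u >= 0."""
    if u <= 0.0:
        return u + u * u / 2.0
    return u + u * u / 2.0 - u ** 3 / 6.0

def lemma_holds_on_grid(step=0.001, u_max=5.0, tol=1e-12):
    """Check lhs(u) >= rhs(u) - tol for u = -1, -1 + step, ..., up to u_max."""
    count = int(round((u_max + 1.0) / step))
    return all(lhs(-1.0 + k * step) >= rhs(-1.0 + k * step) - tol
               for k in range(count + 1))

if __name__ == "__main__":
    print(lemma_holds_on_grid())
```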
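From (22) and the inequality in Lemma 1, it follows that

$$ D\left( \frac{\delta_1^{(\eta,n)} + \gamma_1}{1 + \gamma_1} \,\Big\|\, \frac{\gamma_1}{1 + \gamma_1} \right) = \frac{\gamma_1}{1 + \gamma_1} \left[ \left( 1 + \frac{\delta_1^{(\eta,n)}}{\gamma_1} \right) \ln\left( 1 + \frac{\delta_1^{(\eta,n)}}{\gamma_1} \right) + \frac{1}{\gamma_1} \left( 1 - \delta_1^{(\eta,n)} \right) \ln\left( 1 - \delta_1^{(\eta,n)} \right) \right] $$

$$ \ge \frac{\gamma_1}{1 + \gamma_1} \left[ \left( \frac{\delta_1^{(\eta,n)}}{\gamma_1} + \frac{(\delta_1^{(\eta,n)})^2}{2\gamma_1^2} - \frac{(\delta_1^{(\eta,n)})^3}{6\gamma_1^3} \right) + \frac{1}{\gamma_1} \left( -\delta_1^{(\eta,n)} + \frac{(\delta_1^{(\eta,n)})^2}{2} \right) \right] $$

$$ = \frac{(\delta_1^{(\eta,n)})^2}{2\gamma_1} - \frac{(\delta_1^{(\eta,n)})^3}{6\gamma_1^2 (1 + \gamma_1)} $$

$$ = \frac{\varepsilon_1^2 \, n^{-2(1-\eta)}}{2\sigma_1^2} \left( 1 - \frac{\varepsilon_1 d_1}{3\sigma_1^2 (1 + \gamma_1)} \cdot \frac{1}{n^{1-\eta}} \right) $$

provided that $\delta_1^{(\eta,n)} < 1$ (which holds for $n \ge n_0$ for some
$n_0 \triangleq n_0(\eta, \varepsilon_1, d_1) \in \mathbb{N}$ that is determined from (22)). By
substituting this lower bound on the divergence into (21), it
follows that

$$ \alpha_n^{(1)} = P_1^n\left( L(X_1, \ldots, X_n) \le n D(P_1 \| P_2) - \varepsilon_1 n^{\eta} \right) \le \exp\left( -\frac{\varepsilon_1^2 \, n^{2\eta - 1}}{2\sigma_1^2} \left( 1 - \frac{\varepsilon_1 d_1}{3\sigma_1^2 (1 + \gamma_1)} \cdot \frac{1}{n^{1-\eta}} \right) \right). \tag{24} $$

Consequently, in the limit where n tends to infinity

$$ \lim_{n \to \infty} n^{1-2\eta} \ln \alpha_n^{(1)} \le -\frac{\varepsilon_1^2}{2\sigma_1^2} \tag{25} $$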



with $\sigma_1^2$ in (20). From the analysis in Section II-C and
Appendix A, it follows that the inequality for the asymptotic
limit in (25) holds in fact with equality. To verify this, consider
the real-valued sequence of i.i.d. RVs

$$ Y_i \triangleq \ln \frac{P_1(X_i)}{P_2(X_i)} - D(P_1 \| P_2), \quad i = 1, \ldots, n $$

that, under hypothesis $H_1$, have zero mean and variance $\sigma_1^2$.
Since, by assumption, the sequence $\{X_i\}_{i=1}^{n}$ are i.i.d., then

$$ L(X_1, \ldots, X_n) - n D(P_1 \| P_2) = \sum_{i=1}^{n} Y_i \tag{26} $$

and it follows from the one-sided version of the MDP in (6)
that indeed (25) holds with equality. Moreover, Theorem 2
provides, via the inequality in (24), a finite-length result that
enhances the asymptotic result for $n \to \infty$.
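To illustrate how the finite-length bound in (24) relates to the asymptotic limit in (25), the sketch below evaluates the natural logarithm of the right-hand side of (24) and its normalized version $n^{1-2\eta} \ln(\cdot)$. The function names are hypothetical, and the values of $d_1$ and $\sigma_1^2$ in the example are illustrative inputs rather than results from this paper.

```python
import math

def log_bound_24(n, eps1, eta, d1, sigma1_sq):
    """Natural log of the right-hand side of (24); requires delta_1 < 1."""
    gamma1 = sigma1_sq / d1 ** 2
    shrink = n ** (-(1.0 - eta))            # n^{-(1 - eta)}
    delta1 = eps1 * shrink / d1             # delta_1^{(eta, n)} in (22)
    if delta1 >= 1.0:
        raise ValueError("n is below n_0: the bound requires delta_1 < 1")
    correction = 1.0 - eps1 * d1 * shrink / (3.0 * sigma1_sq * (1.0 + gamma1))
    return -(eps1 ** 2) * n ** (2.0 * eta - 1.0) / (2.0 * sigma1_sq) * correction

def normalized_exponent(n, eps1, eta, d1, sigma1_sq):
    """n^{1 - 2 eta} times the log of the bound; tends to -eps1^2 / (2 sigma_1^2)."""
    return n ** (1.0 - 2.0 * eta) * log_bound_24(n, eps1, eta, d1, sigma1_sq)

if __name__ == "__main__":
    eps1, eta, d1, sigma1_sq = 1.0, 0.75, 1.7869, 0.7982   # illustrative values
    limit = -eps1 ** 2 / (2.0 * sigma1_sq)
    for n in (10 ** 4, 10 ** 8, 10 ** 12):
        print(n, normalized_exponent(n, eps1, eta, d1, sigma1_sq), limit)
```

The normalized exponent of the bound approaches the limit from above at the rate $n^{-(1-\eta)}$, in agreement with the correction term in (24).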

In the considered setting of moderate deviations analysis for
binary hypothesis testing, the upper bound on the probability
$\alpha_n^{(1)}$ in (24), which refers to the probability of either making
an error or declaring an erasure (i.e., making no decision)
under the hypothesis $H_1$, decays to zero sub-exponentially
with the length n of the sequence. As mentioned above, based
on the analysis in Section II-C and Appendix A, the asymptotic
upper bound in (25) is tight. A completely similar
moderate-deviations analysis can be also performed under the hypothesis
$H_2$. Hence, a sub-exponential scaling of the probability $\beta_n^{(1)}$ in
(11) of either making an error or declaring an erasure (where
the lower threshold $\underline{\lambda}$ is replaced with $\underline{\lambda}^{(n)}$) also holds under
the hypothesis $H_2$. These two sub-exponential decays to zero
for the probabilities $\alpha_n^{(1)}$ and $\beta_n^{(1)}$, under hypothesis $H_1$ or $H_2$
respectively, improve as the value of $\eta \in (\frac{1}{2}, 1)$ is increased.
On the other hand, the two exponential decays to zero of the
probabilities of error (i.e., $\alpha_n^{(2)}$ and $\beta_n^{(2)}$ under hypothesis $H_1$
or $H_2$, respectively) improve as the value of $\eta \in (\frac{1}{2}, 1)$ is
decreased; this is due to the fact that, for a fixed value of n, the
margin which serves to protect us from making an error (either
under hypothesis $H_1$ or $H_2$) is increased by decreasing the
value of $\eta$ as above (note that by reducing the value of $\eta$ for a
fixed n, the upper and lower thresholds $\overline{\lambda}^{(n)}$ and $\underline{\lambda}^{(n)}$ are made
closer to $D(P_1 \| P_2)$ from below and to $-D(P_2 \| P_1)$ from
above, respectively, which therefore increases the margin that
is used for protecting one from making an erroneous decision).
This shows the existence of a tradeoff, in the choice of the
parameter $\eta \in (\frac{1}{2}, 1)$, between the probability of error and the
joint probability of error and erasure under either hypothesis
$H_1$ or $H_2$ (where this tradeoff exists symmetrically for each
of the two hypotheses).

In [4] and [17], the authors consider moderate deviations
analysis for channel coding over memoryless channels. In
particular, [4, Theorem 2.2] and [17, Theorem 6] indicate
a tight lower bound (i.e., a converse) to the asymptotic result
in (25) for binary hypothesis testing. This tight converse is
indeed consistent with the asymptotic result of the MDP in (6)
for real-valued i.i.d. random variables, which implies that the
asymptotic upper bound in (25), obtained via the martingale
approach with the refined version of Azuma's inequality in
Theorem 2, holds indeed with equality. Note that this equality
does not follow from Azuma's inequality, so its refinement
was essential for obtaining this equality. The reason is that,
due to Appendix A, the upper bound in (25) that is equal to
$-\frac{\varepsilon_1^2}{2\sigma_1^2}$ is replaced via Azuma's inequality by the looser bound
$-\frac{\varepsilon_1^2}{2 d_1^2}$ (note that, from (19) and (20), $\sigma_1 \le d_1$ where in general
$\sigma_1$ may be significantly smaller than $d_1$).



IV. Summary 

This paper is focused on the moderate deviations analysis
of binary hypothesis testing. The analysis is based on a
concentration inequality for discrete-parameter martingales with
bounded jumps, which forms a refined version of Azuma's
inequality (see [10, Corollary 2.4.7]). The relation of this
concentration inequality to the moderate deviations principle
for i.i.d. random variables is considered. This paper presents in
part the work in [19], and it exemplifies the use of a refinement
of Azuma's inequality in an information-theoretic aspect.
Further information-theoretic applications are considered in,
e.g., [20] and [24]. The slides are available in [21].

Acknowledgment: One of the reviewers pointed out that
the moderate deviations analysis in this work can be done
alternatively by relying on results, e.g., from [3] or [18]. We
thank the reviewer for this note, and we currently study this
line of work.

Appendix A 

Analysis Related to the Moderate Deviations
Principle For i.i.d. RVs (See Section II-C)

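It is demonstrated in the following that, in contrast to
Azuma's inequality, Theorem 2 provides an upper bound on

$$ \mathbb{P}\left( \left| \sum_{i=1}^{n} X_i \right| \ge \alpha n^{\eta} \right), \quad \forall\, \alpha \ge 0 $$

which coincides with
the correct asymptotic result in (6). It is proved under the
further assumption that there exists some constant $d > 0$
such that $|X_k| \le d$ a.s. for every $k \in \mathbb{N}$ (since the RVs
$\{X_k\}$ are assumed to be i.i.d., it is sufficient to require it for
$k = 1$). Let us define the martingale sequence $\{S_k, \mathcal{F}_k\}_{k=0}^{n}$
where

$$ S_k \triangleq \sum_{i=1}^{k} X_i, \qquad \mathcal{F}_k \triangleq \sigma(X_1, \ldots, X_k) $$

for every $k \in \{1, \ldots, n\}$ with $S_0 = 0$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$.

1) Analysis related to Azuma's inequality: The martingale
sequence $\{S_k, \mathcal{F}_k\}_{k=0}^{n}$ has uniformly bounded jumps, where
$|S_k - S_{k-1}| = |X_k| \le d$ a.s. for every $k \in \{1, \ldots, n\}$. Hence
it follows from Azuma's inequality that, for every $\alpha \ge 0$,

$$ \mathbb{P}(|S_n| \ge \alpha n^{\eta}) \le 2 \exp\left( -\frac{\alpha^2 n^{2\eta - 1}}{2 d^2} \right) $$

and therefore

$$ \lim_{n \to \infty} n^{1-2\eta} \ln \mathbb{P}(|S_n| \ge \alpha n^{\eta}) \le -\frac{\alpha^2}{2 d^2}. \tag{27} $$

This differs from the limit in (6) where $\sigma^2$ is replaced by $d^2$,
so Azuma's inequality does not provide the correct asymptotic
result in (6) (unless $\sigma^2 = d^2$, i.e., $|X_k| = d$ a.s. for every k).

2) Analysis related to Theorem 2: From Theorem 2 it
follows that for every $\alpha \ge 0$,

$$ \mathbb{P}(|S_n| \ge \alpha n^{\eta}) \le 2 \exp\left( -n \, D\left( \frac{\delta' + \gamma}{1 + \gamma} \,\Big\|\, \frac{\gamma}{1 + \gamma} \right) \right) $$

where $\gamma$ is introduced in (3), and $\delta'$ is given by

$$ \delta' \triangleq \frac{\alpha n^{-(1-\eta)}}{d} = \delta \, n^{-(1-\eta)} \tag{28} $$

due to the definition of $\delta$ in (3). Hence, it follows that

$$ \mathbb{P}(|S_n| \ge \alpha n^{\eta}) \le 2 \exp\left( -\frac{\delta^2 n^{2\eta - 1}}{2\gamma} \left( 1 - \frac{\alpha (1 - \gamma)}{3 \gamma d} \cdot n^{-(1-\eta)} + \ldots \right) \right) $$

for every $n \in \mathbb{N}$, and therefore (since, from (3), $\frac{\delta^2}{\gamma} = \frac{\alpha^2}{\sigma^2}$)

$$ \lim_{n \to \infty} n^{1-2\eta} \ln \mathbb{P}(|S_n| \ge \alpha n^{\eta}) \le -\frac{\alpha^2}{2\sigma^2}. \tag{29} $$

Hence, this bound coincides with the exact limit in (6).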
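The comparison made in this appendix can be illustrated numerically. The sketch below evaluates, for a finite n, the normalized exponents $n^{1-2\eta} \ln(\cdot)$ of the exponential terms in the two bounds (the factor 2 in front of each bound is dropped, since its contribution $n^{1-2\eta} \ln 2$ vanishes as $n \to \infty$), next to the limit in (6). The function names and parameter values are assumptions made for illustration.

```python
import math

def binary_kl(p, q):
    """D(p||q) in nats between (p, 1-p) and (q, 1-q), with 0*ln(0) = 0."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total

def normalized_exponents(n, alpha, eta, d, sigma2):
    """Return (Azuma, Theorem 2, MDP limit) normalized exponents.

    Each value is n^{1 - 2 eta} times the log of the exponential term of the
    corresponding bound; the third entry is the limit -alpha^2 / (2 sigma^2).
    """
    scale = n ** (1.0 - 2.0 * eta)
    gamma = sigma2 / d ** 2
    delta_n = alpha * n ** (-(1.0 - eta)) / d          # delta' in (28)
    if delta_n > 1.0:
        raise ValueError("alpha * n^eta exceeds n * d: the probability is zero")
    azuma = -alpha ** 2 / (2.0 * d ** 2)               # independent of n
    refined = -scale * n * binary_kl((delta_n + gamma) / (1.0 + gamma),
                                     gamma / (1.0 + gamma))
    mdp_limit = -alpha ** 2 / (2.0 * sigma2)
    return azuma, refined, mdp_limit

if __name__ == "__main__":
    for n in (10 ** 4, 10 ** 6, 10 ** 8):
        print(n, normalized_exponents(n, 1.0, 0.75, 1.0, 0.25))
```

With $\sigma^2 < d^2$, the exponent obtained from Theorem 2 approaches $-\alpha^2/(2\sigma^2)$ as n grows, whereas the one obtained from Azuma's inequality stays at $-\alpha^2/(2d^2)$.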